316 research outputs found

    Boundary State Generation for Testing and Improvement of Autonomous Driving Systems

    Full text link
    Recent advances in Deep Neural Networks (DNNs) and sensor technologies are enabling autonomous driving systems (ADSs) with an ever-increasing level of autonomy. However, assessing their dependability remains a critical concern. State-of-the-art ADS testing approaches modify the controllable attributes of a simulated driving environment until the ADS misbehaves. Such approaches have two main drawbacks: (1) modifications to the simulated environment might not be easily transferable to the in-field test setting (e.g., changing the road shape); (2) environment instances in which the ADS is successful are discarded, despite the possibility that they could contain hidden driving conditions in which the ADS may misbehave. In this paper, we present GenBo (GENerator of BOundary state pairs), a novel test generator for ADS testing. GenBo mutates the driving conditions of the ego vehicle (position, velocity and orientation), collected in a failure-free environment instance, and efficiently generates challenging driving conditions at the behavior boundary (i.e., where the model starts to misbehave) in the same environment. We use such boundary conditions to augment the initial training dataset and retrain the DNN model under test. Our evaluation results show that the retrained model has up to 16 higher success rate on a separate set of evaluation tracks with respect to the original DNN model

    Testing of Deep Reinforcement Learning Agents with Surrogate Models

    Full text link
    Deep Reinforcement Learning (DRL) has received a lot of attention from the research community in recent years. As the technology moves away from game playing to practical contexts, such as autonomous vehicles and robotics, it is crucial to evaluate the quality of DRL agents. In this paper, we propose a search-based approach to test such agents. Our approach, implemented in a tool called Indago, trains a classifier on failure and non-failure environment (i.e., pass) configurations resulting from the DRL training process. The classifier is used at testing time as a surrogate model for the DRL agent execution in the environment, predicting the extent to which a given environment configuration induces a failure of the DRL agent under test. The failure prediction acts as a fitness function, guiding the generation towards failure environment configurations, while saving computation time by deferring the execution of the DRL agent in the environment to those configurations that are more likely to expose failures. Experimental results show that our search-based approach finds 50% more failures of the DRL agent than state-of-the-art techniques. Moreover, such failures are, on average, 78% more diverse; similarly, the behaviors of the DRL agent induced by failure configurations are 74% more diverse

    Search based path and input data generation for web application testing

    Get PDF
    Test case generation for web applications aims at ensuring full coverage of the navigation structure. Existing approaches resort to crawling and manual/random input generation, with or without a preliminary construction of the navigation model. However, crawlers might be unable to reach some parts of the web application and random input generation might not receive enough guidance to produce the inputs needed to cover a given path. In this paper, we take advantage of the navigation structure implicitly specified by developers when they write the page objects used for web testing and we define a novel set of genetic operators that support the joint generation of test inputs and feasible navigation paths. On a case study, our tool Subweb was able to achieve higher coverage of the navigation model than crawling based approaches, thanks to its intrinsic ability of generating inputs for feasible paths and of discarding likely infeasible paths

    Generating and Detecting True Ambiguity: A Forgotten Danger in DNN Supervision Testing

    Full text link
    Deep Neural Networks (DNNs) are becoming a crucial component of modern software systems, but they are prone to fail under conditions that are different from the ones observed during training (out-of-distribution inputs) or on inputs that are truly ambiguous, i.e., inputs that admit multiple classes with nonzero probability in their labels. Recent work proposed DNN supervisors to detect high-uncertainty inputs before their possible misclassification leads to any harm. To test and compare the capabilities of DNN supervisors, researchers proposed test generation techniques, to focus the testing effort on high-uncertainty inputs that should be recognized as anomalous by supervisors. However, existing test generators aim to produce out-of-distribution inputs. No existing model- and supervisor independent technique targets the generation of truly ambiguous test inputs, i.e., inputs that admit multiple classes according to expert human judgment. In this paper, we propose a novel way to generate ambiguous inputs to test DNN supervisors and used it to empirically compare several existing supervisor techniques. In particular, we propose AmbiGuess to generate ambiguous samples for image classification problems. AmbiGuess is based on gradient-guided sampling in the latent space of a regularized adversarial autoencoder. Moreover, we conducted what is -- to the best of our knowledge -- the most extensive comparative study of DNN supervisors, considering their capabilities to detect 4 distinct types of high-uncertainty inputs, including truly ambiguous ones. We find that the tested supervisors' capabilities are complementary: Those best suited to detect true ambiguity perform worse on invalid, out-of-distribution and adversarial inputs and vice-versa.Comment: Accepted for publication at Springers "Empirical Software Engineering" (EMSE

    Assessment of Source Code Obfuscation Techniques

    Get PDF
    Obfuscation techniques are a general category of software protections widely adopted to prevent malicious tampering of the code by making applications more difficult to understand and thus harder to modify. Obfuscation techniques are divided in code and data obfuscation, depending on the protected asset. While preliminary empirical studies have been conducted to determine the impact of code obfuscation, our work aims at assessing the effectiveness and efficiency in preventing attacks of a specific data obfuscation technique - VarMerge. We conducted an experiment with student participants performing two attack tasks on clear and obfuscated versions of two applications written in C. The experiment showed a significant effect of data obfuscation on both the time required to complete and the successful attack efficiency. An application with VarMerge reduces by six times the number of successful attacks per unit of time. This outcome provides a practical clue that can be used when applying software protections based on data obfuscation.Comment: Post-print, SCAM 201

    Using multi-locators to increase the robustness of web test cases

    Get PDF
    The main reason for the fragility of web test cases is the inability of web element locators to work correctly when the web page DOM evolves. Web elements locators are used in web test cases to identify all the GUI objects to operate upon and eventually to retrieve web page content that is compared against some oracle in order to decide whether the test case has passed or not. Hence, web element locators play an extremely important role in web testing and when a web element locator gets broken developers have to spend substantial time and effort to repair it. While algorithms exist to produce robust web element locators to be used in web test scripts, no algorithm is perfect and different algorithms are exposed to different fragilities when the software evolves. Based on such observation, we propose a new type of locator, named multi-locator, which selects the best locator among a candidate set of locators produced by different algorithms. Such selection is based on a voting procedure that assigns different voting weights to different locator generation algorithms. Experimental results obtained on six web applications, for which a subsequent release was available, show that the multi-locator is more robust than the single locators (about -30% of broken locators w.r.t. the most robust kind of single locator) and that the execution overhead required by the multiple queries done with different locators is negligible (2-3% at most)

    Deep Reinforcement Learning for Black-Box Testing of Android Apps

    Full text link
    The state space of Android apps is huge and its thorough exploration during testing remains a major challenge. In fact, the best exploration strategy is highly dependent on the features of the app under test. Reinforcement Learning (RL) is a machine learning technique that learns the optimal strategy to solve a task by trial and error, guided by positive or negative reward, rather than by explicit supervision. Deep RL is a recent extension of RL that takes advantage of the learning capabilities of neural networks. Such capabilities make Deep RL suitable for complex exploration spaces such as the one of Android apps. However, state of the art, publicly available tools only support basic, tabular RL. We have developed ARES, a Deep RL approach for black-box testing of Android apps. Experimental results show that it achieves higher coverage and fault revelation than the baselines, which include state of the art RL based tools, such as TimeMachine and Q-Testing. We also investigated qualitatively the reasons behind such performance and we have identified the key features of Android apps that make Deep RL particularly effective on them to be the presence of chained and blocking activities

    Why Creating Web Page Objects Manually if It Can Be Done Automatically?

    Get PDF
    Page Object is a design pattern aimed at making web test scripts more readable, robust and maintainable. The effort to manually create the page objects needed for a web application may be substantial and unfortunately existing tools do not help web developers in such task.In this paper we present APOGEN, a tool for the automatic generation of page objects for web applications. Our tool automatically derives a testing model by reverse engineering the target web application and uses a combination of dynamic and static analysis to generate Java page objects for the popular Selenium WebDriver framework. Our preliminary evaluation shows that it is possible to use around 3/4 of the automatic page object methods as they are, while the remaining 1/4 need only minor modifications

    Simulation-based test case generation for unmanned aerial vehicles in the neighborhood of real flights

    Get PDF
    Unmanned aerial vehicles (UAVs), also known as drones, are acquiring increasing autonomy. With their commercial adoption, the problem of testing their functional and non-functional, and in particular their safety requirements has become a critical concern. Simulation-based testing represents a fundamental practice, but the testing scenarios considered in software-in-the-loop testing may not be representative of the actual scenarios experienced in the field. In this paper, we propose SURREAL (teSting Uavs in the neighboRhood of REAl fLights), a novel search-based approach that analyses logs of real UAV flights and automatically generates simulation-based tests in the neighborhood of such real flights, thereby improving the realism and representativeness of the simulation-based tests. This is done in two steps: first, SURREAL faithfully replicates the given UAV flight in the simulation environment, generating a simulation-based test that mirrors a pre-logged real-world behavior. Then, it smoothly manipulates the replicated flight conditions to discover slightly modified flight scenarios that are challenging or trigger misbehaviors of the UAV under test in simulation. In our experiments, we were able to replicate a real flight accurately in the simulation environment and to expose unstable and potentially unsafe behavior in the neighborhood of a flight, which even led to crashes

    Automated Test Case Generation as a Many-Objective Optimisation Problem with Dynamic Selection of the Targets

    Get PDF
    The test case generation is intrinsically a multi-objective problem, since the goal is covering multiple test targets (e.g., branches). Existing search-based approaches either consider one target at a time or aggregate all targets into a single fitness function (whole-suite approach). Multi and many-objective optimisation algorithms (MOAs) have never been applied to this problem, because existing algorithms do not scale to the number of coverage objectives that are typically found in real-world software. In addition, the final goal for MOAs is to find alternative trade-off solutions in the objective space, while in test generation the interesting solutions are only those test cases covering one or more uncovered targets. In this paper, we present DynaMOSA (Dynamic Many-Objective Sorting Algorithm), a novel many-objective solver specifically designed to address the test case generation problem in the context of coverage testing. DynaMOSA extends our previous many-objective technique MOSA (Many-Objective Sorting Algorithm) with dynamic selection of the coverage targets based on the control dependency hierarchy. Such extension makes the approach more effective and efficient in case of limited search budget. We carried out an empirical study on 346 Java classes using three coverage criteria (i.e., statement, branch, and strong mutation coverage) to assess the performance of DynaMOSA with respect to the whole-suite approach (WS), its archive-based variant (WSA) and MOSA. The results show that DynaMOSA outperforms WSA in 28% of the classes for branch coverage (+8% more coverage on average) and in 27% of the classes for mutation coverage (+11% more killed mutants on average). It outperforms WS in 51% of the classes for statement coverage, leading to +11% more coverage on average. Moreover, DynaMOSA outperforms its predecessor MOSA for all the three coverage criteria in 19% of the classes with +8% more code coverage on average
    • …
    corecore